Chemical Similarity Searching with Neural Graph Matching Methods
Abstract
This thesis examines three novel structural similarity methods that use a network of simple auto-associative neural networks to store structural information about databases of molecular graphs. This information can be used to find similarities between a query graph and any of the graphs in the model database. The fast learning and recall of the neural networks enable efficient approximate matching of graphs.

The first method is based on the theory of relaxation labelling, which is discussed in chapter 4. The consistencies between label pairs for different edge labels are stored in the associative memories. The discrete relaxation version initialises, for each query vertex, a set of candidate vertex labels drawn from the model graphs. The support for each candidate label is measured by counting the neighbouring query vertices that have at least one candidate label consistent with it. Candidate labels whose support falls below a specified minimum are deemed implausible and are eliminated. This algorithm is applied to different molecular data sets in chapter 5. The motivation is that searching for molecules similar to a known drug may yield molecules that exhibit similar properties in the relevant assays. Comparison with conventional graph matching techniques reveals a slight deterioration in search effectiveness, owing to the approximate nature of the neural relaxation algorithm. However, the algorithm is often faster than the other methods, and different types of encoding mechanisms do not cause it to time out as they do some conventional methods.

The second method is a novel graph clustering technique, which is examined in chapter 6. Here, the approximate mean graph of a cluster of similar graphs is evaluated on a data set of small images and on molecular databases. Since computing the mean graph is expensive, the neural relaxation algorithm from the previous chapter is employed as a pre-processor to prune the search space. The approximate mean graph is subsequently used as the input to the similarity search. The results show a significant increase in effectiveness compared with searches that use single reference graphs.

The third method takes a different approach to structural similarity and is discussed in chapter 7. The graphs in the database are decomposed into sets of subgraphs, which are stored in a network of binary neural networks. Recurring subgraphs are stored only once and are reused whenever they are encountered. These subgraphs can be treated as features of the graphs, and the number of features that two graphs have in common indicates their degree of similarity. Assuming conditional independence of these features, simple classification algorithms can be applied to discriminate between different classes of graphs.
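The discrete relaxation step of the first method (candidate initialisation, support counting, and elimination below a minimum support) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the thesis implementation: plain Python dictionaries and sets stand in for the associative neural memories, candidates are initialised by matching vertex labels, and the names `build_adjacency`, `discrete_relaxation`, and `min_support` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map vertex -> {neighbour: edge_label} for an undirected graph."""
    adj = defaultdict(dict)
    for u, v, label in edges:
        adj[u][v] = label
        adj[v][u] = label
    return adj


def discrete_relaxation(query_labels, query_edges,
                        model_labels, model_edges, min_support):
    """Prune candidate model vertices for each query vertex.

    query_labels / model_labels: dict vertex -> vertex label (e.g. atom type)
    query_edges / model_edges:   iterable of (u, v, edge_label)
    min_support: minimum number of supporting query neighbours (assumed
                 to be an absolute count in this sketch)
    """
    q_adj = build_adjacency(query_edges)
    m_adj = build_adjacency(model_edges)

    # Assumption: initial candidates are all model vertices with the same label.
    candidates = {
        q: {m for m, lab in model_labels.items() if lab == q_lab}
        for q, q_lab in query_labels.items()
    }

    changed = True
    while changed:  # repeat until no candidate is eliminated
        changed = False
        for q in query_labels:
            kept = set()
            for m in candidates[q]:
                # Count query neighbours that have at least one candidate
                # joined to m by an edge with the same edge label.
                support = sum(
                    1
                    for q_nb, e_lab in q_adj[q].items()
                    if any(m_adj[m].get(m_nb) == e_lab
                           for m_nb in candidates[q_nb])
                )
                if support >= min_support:
                    kept.add(m)
            if kept != candidates[q]:
                candidates[q] = kept
                changed = True
    return candidates


# Query: a C-O fragment. Model: a three-vertex chain C-C-O.
query_labels = {"q0": "C", "q1": "O"}
query_edges = [("q0", "q1", "single")]
model_labels = {"a": "C", "b": "C", "c": "O"}
model_edges = [("a", "b", "single"), ("b", "c", "single")]

result = discrete_relaxation(query_labels, query_edges,
                             model_labels, model_edges, min_support=1)
print(result)
```

In this toy example, the model carbon `a` is eliminated as a candidate for `q0` because it has no oxygen neighbour, while `b` and `c` survive. Raising `min_support` makes the pruning stricter; lowering it tolerates more mismatches, which is what makes the matching approximate.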
Similar resources
Chemical similarity searching using a neural graph matcher
A neural graph matcher based on Correlation Matrix Memories is evaluated in terms of efficiency and effectiveness against two maximum common subgraph (mcs) algorithms. The algorithm removes implausible solutions below a user-defined threshold and runs faster than conventional mcs methods on our database of chemical graphs while being slightly less effective.
Weighted Superstructures for Chemical Similarity Searching
Chemical similarity searching forms an important part of the virtual screening process. In this study, we present a graph-based matching method that assembles the target query graph from a number of active molecules. We apply both a weighted and an unweighted maximum common subgraph algorithm to measure the similarities between all molecules in three different data sets and samples of generated...
Measuring Similarity between Graphs Based on the Levenshtein Distance
Graph data has been commonly used and widely researched both in academia and industry for many applications. And measuring similarity between graphs (i.e., graph matching) is the essential step for graph searching, pattern recognition and machine vision. At present, the most widely used approach to address the graph matching problem is graph edit distance (GED). However, the computation complex...
Efficient searching and annotation of metabolic networks using chemical similarity
MOTIVATION The urgent need for efficient and sustainable biological production of fuels and high-value chemicals has elicited a wave of in silico techniques for identifying promising novel pathways to these compounds in large putative metabolic networks. To date, these approaches have primarily used general graph search algorithms, which are prohibitively slow as putative metabolic networks may...
Application of 3D Zernike descriptors to shape-based ligand similarity searching
BACKGROUND The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles both in terms of the encoding scheme and speed. RESULTS In this study, we have examined the efficac...
Publication date: 2006